Managed Table

The default kind of table in Hive is manage or internal table where files, metadata and statistics are managed by internal Hive processes. A managed table is stored under the hive.metastore.warehosue.dir path property. By default in a folder path similar to /user/hive/warehouse/databasename.db/tablename/.The default location can be overridden by the location property during table creation.
  • With external tables, when you drop a table, just the metadata gets deleted and the data exist on the cluster.
  • You can create data in internal tables by issuing INSERT or LOAD DATA statements. 
  • Hive creates a directory in HDFS to hold the data files.
  • When you issue a DROP TABLE statement, hive physically removes all the data files from the directory.
  • To see whether a table is internal or external, and its associated HDFS location, issue the statement DESCRIBE FORMATTED table_name.The Table Type field displays MANAGED_TABLE for internal tables and EXTERNAL_TABLE for external tables.The location field dispaly the path of the table directory as an HDFS URL. 
  • When you issue an ALTER TABLE statement to rename an internal table, all data files are moved into the new HDFS directory for the table.

Employee Table data

7369,SMITH,CLERK,7902,17-DEC-1980,800,100,20
7499,ALLEN,SALESMAN,7698,20-FEB-1981,1600,300,30
7521,WARD,SALESMAN,7698,22-FEB-1981,1250,500,30
7566,JONES,MANAGER,7839,2-APR-1981,2975,100,20
7654,MARTIN,SALESMAN,7698,28-SEP-1981,1250,1400,30
7698,BLAKE,MANAGER,7839,1-MAY-1981,2850,100,30
7782,CLARK,MANAGER,7839,9-JUN-1981,2450,100,10
7788,SCOTT,ANALYST,7566,09-DEC-1982,3000,100,20
7839,KING,PRESIDENT,4561,17-NOV-1981,5000,100,10
7844,TURNER,SALESMAN,7698,8-SEP-1981,1500,10,30
7876,ADAMS,CLERK,7788,12-JAN-1983,1100,100,20
7900,JAMES,CLERK,7698,3-DEC-1981,950,100,30
7902,FORD,ANALYST,7566,3-DEC-1981,3000,100,20
7934,MILLER,CLERK,7782,23-JAN-1982,1300,100,10

Create Employee Table

create table emp
(
   empno int,
   ename varchar(20),
   job varchar(20),
   mgr int,
   hiredate string,
   sal float,
   comm float,
   deptno int
)ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' STORED AS TEXTFILE

Load data into manage Table

  • Load data from local file system
load data local inpath '/home/cloudera/data/emp.txt' into table emp;
  • Load data from Hadoop file system
load data local inpath '/home/cloudera/data/emp.txt' into table emp;
  • Overwirte the existing data into table(optional)
load data local inpath '/home/cloudera/data/emp.txt' OVERWRITE into table emp; 

Insert Command

The insert command is used to load the data Hive table. Inserts can be done to a table or a partition.
  • INSERT OVERWRITE is used to overwrite the existing data in the table or partition.
  • INSERT INTO is used to append the data into existing data in a table. (Note: INSERT INTO syntax is work from the version 0.8).

How to convert managed table to external table.

  • alter table table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
 Convert to an internal table to an external table:
  • ALTER TABLE <tablename> SET TBLPROPERTIES('EXTERNAL'='TRUE');
Convert to an external table to an internal table:
  • ALTER TABLE <tablename> SET TBLPROPERTIES('EXTERNAL'='FALSE');
To Skip the table header
  • TBLPROPERTIES ("skip.header.line.count" = "1");

No comments:

Post a Comment